AITopics

2506.00934

Country:

Europe (0.46)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Marinoni, Christian, Gramaccioni, Riccardo Fosco, Chen, Changan, Uncini, Aurelio, Comminiello, Danilo

Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality

arXiv.org Artificial Intelligence, Feb-14-2024

The primary goal of the L3DAS23 Signal Processing Grand Challenge at ICASSP 2023 is to promote and support collaborative research on machine learning for 3D audio signal processing, with a specific emphasis on 3D speech enhancement and 3D Sound Event Localization and Detection in Extended Reality applications. As part of our latest competition, we provide a brand-new dataset, which maintains the same general characteristics as the L3DAS21 and L3DAS22 datasets, but with first-order Ambisonics recordings from multiple reverberant simulated environments. Moreover, we start exploring an audio-visual scenario by providing images of these environments, as perceived from the different microphone positions and orientations. We also propose updated baseline models for both tasks that can now take audio-image pairs as input, along with a supporting API to replicate our results. Finally, we present the results of the participants. Further details about the challenge are available at https://www.l3das.com/icassp2023.

microphone, simulated environment, task 2, (15 more...)

2402.09245

Country:

North America > United States > Texas > Travis County > Austin (0.05)
Europe > Italy > Lazio > Rome (0.05)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

Neural Information Processing Systems, Apr-6-2023, 15:32:17 GMT

Efficient Unsupervised Learning for Localization and Detection in Object Categories

We describe a novel method for learning templates for recognition and localization of objects drawn from categories. A generative model represents the configuration of multiple object parts with respect to an object coordinate system; these parts in turn generate image features. The complexity of the model in the number of features is low, meaning our model is much more efficient to train than comparative methods. Moreover, a variational approximation is introduced that allows learning to be orders of magnitude faster than previous approaches while incorporating many more features. Our model has been carefully tested on standard datasets; we compare with a number of recent template models.

efficient unsupervised learning, localization and detection, object category

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.40)

Grassucci, Eleonora, Mancini, Gioia, Brignone, Christian, Uncini, Aurelio, Comminiello, Danilo

Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation

arXiv.org Artificial Intelligence, Dec-14-2022

Spatial audio methods are gaining a growing interest due to the spread of immersive audio experiences and applications, such as virtual and augmented reality. For these purposes, 3D audio signals are often acquired through arrays of Ambisonics microphones, each comprising four capsules that decompose the sound field in spherical harmonics. In this paper, we propose a dual quaternion representation of the spatial sound field acquired through an array of two First Order Ambisonics (FOA) microphones. The audio signals are encapsulated in a dual quaternion that leverages quaternion algebra properties to exploit correlations among them. This augmented representation with 6 degrees of freedom (6DOF) involves a more accurate coverage of the sound field, resulting in a more precise sound localization and a more immersive audio experience. We evaluate our approach on a sound event localization and detection (SELD) benchmark. We show that our dual quaternion SELD model with temporal convolution blocks (DualQSELD-TCN) achieves better results with respect to real and quaternion-valued baselines thanks to our augmented representation of the sound field. Full code is available at: https://github.com/ispamm/DualQSELD-TCN.
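The dual quaternion idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see their repository for that). It only shows the standard algebra: a dual quaternion is a pair of quaternions (real part and dual part) with the dual unit squaring to zero. The function names and the choice of placing one microphone's four FOA channels in the real part and the other's in the dual part are assumptions made for illustration.

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [w, x, y, z]."""
    pw, px, py, pz = np.moveaxis(p, -1, 0)
    qw, qx, qy, qz = np.moveaxis(q, -1, 0)
    return np.stack([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ], axis=-1)

def dual_quat_mul(a, b):
    """Dual quaternion product for (..., 8) arrays: real part, then dual part.

    With eps**2 == 0: (ar + eps*ad)(br + eps*bd) = ar*br + eps*(ar*bd + ad*br).
    """
    ar, ad = a[..., :4], a[..., 4:]
    br, bd = b[..., :4], b[..., 4:]
    real = hamilton(ar, br)
    dual = hamilton(ar, bd) + hamilton(ad, br)
    return np.concatenate([real, dual], axis=-1)

def pack_foa_pair(mic_a, mic_b):
    """Pack two FOA recordings, each shaped (4, T), into a (T, 8) array.

    Assumption: mic_a fills the real part and mic_b the dual part.
    """
    return np.concatenate([mic_a.T, mic_b.T], axis=-1)
```

Under these assumptions, each time frame of the two-microphone array becomes one 8-component dual quaternion, and a learnable layer could mix frames with `dual_quat_mul` instead of treating the eight channels as unrelated real values.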

artificial intelligence, human computer interaction, machine learning, (18 more...)

2204.01851

Country:

North America (0.28)
Europe (0.28)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Human Computer Interaction (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Feb-21-2022

L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment

Guizzo, Eric, Marinoni, Christian, Pennese, Marco, Ren, Xinlei, Zheng, Xiguang, Zhang, Chen, Masiero, Bruno, Uncini, Aurelio, Comminiello, Danilo

The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments. This challenge improves and extends the tasks of the L3DAS21 edition. We generated a new dataset, which maintains the same general characteristics as the L3DAS21 datasets, but with a larger number of data points and added constraints that improve the baseline model's efficiency and overcome the major difficulties encountered by the participants of the previous challenge. We updated the baseline model of Task 1, using the architecture that ranked first in the previous challenge edition. We wrote a new supporting API, improving its clarity and ease of use. Finally, we present and discuss the results submitted by all participants. L3DAS22 Challenge website: www.l3das.com/icassp2022.

artificial intelligence, deep learning, machine learning, (19 more...)

doi: 10.1109/ICASSP43922.2022.9746872

2202.10372

Country:

South America > Brazil > São Paulo > Campinas (0.04)
Oceania > Australia > Queensland (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

arXiv.org Artificial Intelligence, Apr-29-2021

L3DAS21 Challenge: Machine Learning for 3D Audio Signal Processing

Guizzo, Eric, Gramaccioni, Riccardo F., Jamili, Saeid, Marinoni, Christian, Massaro, Edoardo, Medaglia, Claudia, Nachira, Giuseppe, Nucciarelli, Leonardo, Paglialunga, Ludovica, Pennese, Marco, Pepe, Sveva, Rocchi, Enrico, Uncini, Aurelio, Comminiello, Danilo

The L3DAS21 Challenge is aimed at encouraging and fostering collaborative research on machine learning for 3D audio signal processing, with particular focus on 3D speech enhancement (SE) and 3D sound localization and detection (SELD). Alongside the challenge, we release the L3DAS21 dataset, a 65-hour 3D audio corpus, accompanied by a Python API that facilitates data usage and the results submission stage. Usually, machine learning approaches to 3D audio tasks are based on single-perspective Ambisonics recordings or on arrays of single-capsule microphones. We propose, instead, a novel multichannel audio configuration based on multiple-source and multiple-perspective Ambisonics recordings, performed with an array of two first-order Ambisonics microphones. To the best of our knowledge, it is the first time that a dual-mic Ambisonics configuration is used for these tasks. We provide baseline models and results for both tasks, obtained with state-of-the-art architectures: FaSNet for SE and SELDNet for SELD. This report is aimed at providing all the information needed to participate in the L3DAS21 Challenge, illustrating the details of the L3DAS21 dataset, the challenge tasks and the baseline models.

artificial intelligence, deep learning, machine learning, (17 more...)

doi: 10.1109/MLSP52302.2021.9596248

2104.05499

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Perez-Lopez, Andres, Fonseca, Eduardo, Serra, Xavier

A hybrid parametric-deep learning approach for sound event localization and detection

arXiv.org Machine Learning, Aug-27-2019

This work describes and discusses an algorithm submitted to the Sound Event Localization and Detection Task of the DCASE2019 Challenge. The proposed methodology relies on parametric spatial audio analysis for source localization and detection, combined with a deep learning-based monophonic event classifier. The evaluation of the proposed algorithm yields overall results comparable to the baseline system. The main highlight is a reduction of the localization error on the evaluation dataset by a factor of 2.6, compared with the baseline performance.

artificial intelligence, estimation, machine learning, (13 more...)

1908.10133

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Loeff, Nicolas, Arora, Himanshu, Sorokin, Alexander, Forsyth, David

Efficient Unsupervised Learning for Localization and Detection in Object Categories

Neural Information Processing Systems, Dec-31-2006

We describe a novel method for learning templates for recognition and localization of objects drawn from categories. A generative model represents the configuration of multiple object parts with respect to an object coordinate system; these parts in turn generate image features. The complexity of the model in the number of features is low, meaning our model is much more efficient to train than comparative methods. Moreover, a variational approximation is introduced that allows learning to be orders of magnitude faster than previous approaches while incorporating many more features.

approximation, localization, object center, (15 more...)

Country:

North America > United States > Illinois (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.51)

Loeff, Nicolas, Arora, Himanshu, Sorokin, Alexander, Forsyth, David

Efficient Unsupervised Learning for Localization and Detection in Object Categories

Neural Information Processing Systems, Dec-31-2006

Desirable characteristics of a model include a good representation of objects and fast, efficient learning algorithms that require as little supervised information as possible.

localization, machine learning, object-oriented architecture, (18 more...)

Country:

North America > United States (0.46)
Europe (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.52)